Impossibility Theorems for Feature Attribution
Despite a sea of interpretability methods that can produce plausible
explanations, the field has also empirically seen many failure cases of such
methods. In light of these results, it remains unclear to practitioners how to
use these methods and choose between them in a principled way. In this paper,
we show that for moderately rich model classes (easily satisfied by neural
networks), any feature attribution method that is complete and linear -- for
example, Integrated Gradients and SHAP -- can provably fail to improve on
random guessing for inferring model behaviour. Our results apply to common
end-tasks such as characterizing local model behaviour, identifying spurious
features, and algorithmic recourse. One takeaway from our work is the
importance of concretely defining end-tasks: once such an end-task is defined,
a simple and direct approach of repeated model evaluations can outperform many
other complex feature attribution methods.
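As a rough illustration of the "repeated model evaluations" baseline the abstract
alludes to, the sketch below estimates a feature's local effect purely by querying
the model; the helper name, toy model, and perturbation scheme are our own
assumptions for illustration, not details from the paper.

```python
import numpy as np

def local_effect_by_evaluation(model, x, i, delta=0.1, n_samples=32, rng=None):
    """Estimate how nudging feature i changes the model output near x,
    using only repeated model evaluations (no attribution method)."""
    rng = np.random.default_rng() if rng is None else rng
    effects = []
    for _ in range(n_samples):
        x_pert = x.copy()
        x_pert[i] += rng.uniform(0.0, delta)
        # Direct evaluation: compare outputs before and after the perturbation.
        effects.append(model(x_pert) - model(x))
    return float(np.mean(effects))

# Hypothetical usage with a toy model: f(x) = sin(x0) + x1**2
model = lambda x: np.sin(x[0]) + x[1] ** 2
x0 = np.array([0.5, -1.0])
print(local_effect_by_evaluation(model, x0, i=0))  # positive: raising x0 raises sin(x0)
print(local_effect_by_evaluation(model, x0, i=1))  # negative: moving x1 toward 0 lowers x1**2
```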
Explaining Latent Factor Models for Recommendation with Influence Functions
Latent factor models (LFMs) such as matrix factorization achieve
state-of-the-art performance among Collaborative Filtering (CF) approaches for
recommendation. Despite the high recommendation accuracy of LFMs, a critical
unresolved issue is their lack of explainability. Extensive efforts have been
made in the literature to incorporate explainability into LFMs. However, these
either rely on auxiliary information that may not be available in practice or
fail to provide easy-to-understand explanations. In
this paper, we propose a fast influence analysis method named FIA, which
equips LFMs with explicit neighbor-style explanations via influence functions,
a technique that stems from robust statistics. We first describe how to apply
influence functions to LFMs to deliver neighbor-style explanations. We then
develop an efficient influence computation algorithm for matrix factorization.
We further extend it to the more general setting of neural collaborative
filtering and introduce an approximation algorithm to accelerate influence
analysis over neural network models. Experimental results on real datasets
demonstrate the correctness, efficiency, and usefulness of our proposed method.
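For context, the generic influence-function recipe that such methods build on
can be sketched for a toy rank-1 matrix factorization as below; the paper
derives efficient closed-form and approximate algorithms instead, so the
finite-difference Hessian, function names, and toy data here are purely
illustrative assumptions.

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-5):
    """Central-difference gradient of a scalar function f at parameter vector theta."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        e = np.zeros_like(theta)
        e[j] = eps
        grad[j] = (f(theta + e) - f(theta - e)) / (2 * eps)
    return grad

def influence_scores(train_loss_fns, test_fn, theta_hat, damping=1e-3, eps=1e-4):
    """Classic influence-function estimate: how upweighting each training point
    changes the test quantity, score_i = -grad(test)^T H^{-1} grad(loss_i)."""
    d = theta_hat.size
    total_loss = lambda th: sum(l(th) for l in train_loss_fns)
    # Hessian of the total training loss, via finite differences of the gradient.
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        H[:, j] = (numerical_grad(total_loss, theta_hat + e)
                   - numerical_grad(total_loss, theta_hat - e)) / (2 * eps)
    H += damping * np.eye(d)  # damping keeps the linear solve well conditioned
    g_test = numerical_grad(test_fn, theta_hat)
    return [-g_test @ np.linalg.solve(H, numerical_grad(l, theta_hat))
            for l in train_loss_fns]

# Toy rank-1 matrix factorization: theta = [p_u0, p_u1, q_i0, q_i1].
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0)]       # (user, item, rating) triples
mf_loss = lambda u, i, r: (lambda th: (th[u] * th[2 + i] - r) ** 2)
train_losses = [mf_loss(u, i, r) for u, i, r in ratings]
test_pred = lambda th: th[1] * th[3]                     # user 1's predicted rating of item 1
theta_hat = np.array([2.2, 1.8, 2.2, 0.45])              # assumed to roughly minimize training loss
print(influence_scores(train_losses, test_pred, theta_hat))
# Training interactions with large |score| serve as neighbor-style explanations.
```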
Out-of-Domain Robustness via Targeted Augmentations
Models trained on one set of domains often suffer performance drops on unseen
domains, e.g., when wildlife monitoring models are deployed in new camera
locations. In this work, we study principles for designing data augmentations
for out-of-domain (OOD) generalization. In particular, we focus on real-world
scenarios in which some domain-dependent features are robust, i.e., some
features that vary across domains are predictive OOD. For example, in the
wildlife monitoring application above, image backgrounds vary across camera
locations but indicate habitat type, which helps predict the species of
photographed animals. Motivated by a theoretical analysis in a linear setting, we
propose targeted augmentations, which selectively randomize spurious
domain-dependent features while preserving robust ones. We prove that targeted
augmentations improve OOD performance, allowing models to generalize better
with fewer domains. In contrast, existing approaches such as generic
augmentations, which fail to randomize domain-dependent features, and
domain-invariant augmentations, which randomize all domain-dependent features,
both perform poorly OOD. In experiments on three real-world datasets, we show
that targeted augmentations improve OOD performance by 3.2-15.2%, setting new
states of the art.
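To make "selectively randomize spurious domain-dependent features while
preserving robust ones" concrete, here is a minimal tabular sketch; the paper's
augmentations operate on images (e.g., wildlife photos), so the feature split,
names, and value pools below are assumptions for illustration only.

```python
import numpy as np

def targeted_augment(x, spurious_idx, value_pool, rng=None):
    """Targeted augmentation on a tabular feature vector: resample only the
    spurious domain-dependent coordinates from values observed across training
    domains, leaving core and robust domain-dependent features untouched."""
    rng = np.random.default_rng() if rng is None else rng
    x_aug = x.copy()
    for j in spurious_idx:
        x_aug[j] = rng.choice(value_pool[j])  # randomize this nuisance feature only
    return x_aug

# Hypothetical feature layout: x[0:2] core features, x[2] robust domain-dependent
# (e.g., habitat type), x[3:5] spurious domain-dependent (e.g., camera artifacts).
value_pool = {3: np.array([-1.0, 0.0, 1.0]), 4: np.array([0.2, 0.8])}
x = np.array([0.5, -1.2, 2.0, 1.0, 0.8])
print(targeted_augment(x, spurious_idx=[3, 4], value_pool=value_pool))
```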